


Document FP4 
Appl. No.: 10/791,681



(12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(19) World Intellectual Property 
Organization 
International Bureau 

(43) International Publication Date 
23 September 2004 (23.09.2004) 




PCT 






(10) International Publication Number 

WO 2004/081841 A1



(51) International Patent Classification 7 : G06F 19/00 

(21) International Application Number: 

PCT/US2003/007366

(22) International Filing Date: 11 March 2003 (11.03.2003)

(25) Filing Language: English 

(26) Publication Language: English 

(71) Applicant: SARNOFF CORPORATION [US/US]; 201
Washington Road, CN 5300, Princeton, NJ 08543 (US).

(72) Inventor: GUARNIERI, Frank; 1742 West Eleventh
Street, Brooklyn, NY 11223 (US).

(74) Agents: GARRETT, Patrick, E. et al.; Sterne, Kessler,
Goldstein & Fox P.L.L.C., 1100 New York Avenue, N.W.,
Suite 600, Washington, DC 20005-3934 (US).



(81) Designated States (national): AE, AG, AL, AM, AT, AU,
AZ, BA, BB, BG, BR, BY, BZ, CA, CH, CN, CO, CR, CU,
CZ, DE, DK, DM, DZ, EC, EE, ES, FI, GB, GD, GE, GH,
GM, HR, HU, ID, IL, IN, IS, JP, KE, KG, KP, KR, KZ, LC,
LK, LR, LS, LT, LU, LV, MA, MD, MG, MK, MN, MW,
MX, MZ, NO, NZ, OM, PH, PL, PT, RO, RU, SC, SD, SE,
SG, SK, SL, TJ, TM, TN, TR, TT, TZ, UA, UG, UZ, VC,
VN, YU, ZA, ZM, ZW.

(84) Designated States (regional): ARIPO patent (GH, GM,
KE, LS, MW, MZ, SD, SL, SZ, TZ, UG, ZM, ZW),
Eurasian patent (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM),
European patent (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
ES, FI, FR, GB, GR, HU, IE, IT, LU, MC, NL, PT, RO,
SE, SI, SK, TR), OAPI patent (BF, BJ, CF, CG, CI, CM,
GA, GN, GQ, GW, ML, MR, NE, SN, TD, TG).

Published: 

— with international search report 

For two-letter codes and other abbreviations, refer to the "Guidance
Notes on Codes and Abbreviations" appearing at the beginning
of each regular issue of the PCT Gazette.



(54) Title: COMPUTATIONAL PROTEIN PROBING TO IDENTIFY BINDING SITES

(57) Abstract: Provided is a computer implemented method of analyzing a
macromolecule for potential binding sites, comprising: positioning an
instance of a computer representation of an organic fragment at a plurality
of potential binding sites of the macromolecule; selecting a value of B,
wherein B = μ'/kT + ln<N>, where μ' is the excess chemical potential, k is
Boltzmann's constant, T is the absolute temperature, and <N> is the mean
number of molecules of the organic fragment; repositioning the instances of
the organic fragment until a minimized energy state is obtained; assessing,
for each instance of the repositioned organic fragment, whether the
repositioned organic fragment binds to the macromolecule at the associated
potential binding site at the selected value of B; deleting instances of the
organic fragment that do not bind at the associated potential binding site
at the selected value of B; repeating these steps at a lesser value of B; and
outputting a list of undeleted instances of the organic fragment; provided
that the organic fragment is not water.



COMPUTATIONAL PROTEIN PROBING TO IDENTIFY BINDING SITES 



BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention relates to methods of identifying binding sites on
proteins, methods for identifying classes of compounds suitable for binding a
protein, and methods of conducting experiments to identify compounds that
interact with a protein to affect a biological process.

Related Art

Determinations of protein structures have to date been conducted by
isolating crystals of the protein of interest, and analyzing structure by X-ray
crystallography. Typically, the protein has been co-crystallized with a heavy
metal component, or subjected to multiple co-crystallizations, with the heavy
metal providing a reference for solving the [illegible] data.

With a determination of the structure of a protein, or the structure of
another macromolecule having significant tertiary structure, such as a DNA or
RNA, workers often seek to identify the binding sites that are or may be of
significance to a biological process, such as an enzyme active site or a site for
interacting with another macromolecule or with itself. Computational efforts
have focused on sampling the surface of a molecule to find good
fits with known binding agents. These methods have had modest success, and
are dependent on knowledge of (a) the structure of good binding agents and,
often, (b) the function of the protein. A more traditional approach has sought
to co-crystallize binding substances with the macromolecule to identify
binding sites. With the binding site identified, educated guesses can be made
as to new molecules that could bind the site. These educated guesses can
guide synthetic methods, including combinatorial chemistry methods, to make
and test new molecules. When such prospective binding agents prove
effective binding agents, and possibly are also found effective in an
appropriate biological model, the structural correlations drawn from the results
can be tied to information about the binding site to make still further
inferences about the structure important to a biological function. This
co-crystallization approach depends on an initial knowledge of active agents, and
is experimentally difficult and time consuming.

The present inventor has found a method of identifying, from a
structural solution of a macromolecule, the binding sites for
molecules. The structural solution used as the basis for the method can be
derived from crystallography, spectroscopic analyses such as NMR,
computational derivations, or any other method of determining the structure of
a macromolecule. The method does not require or typically use information
on the function of the macromolecule,
and instead depends purely on physical parameters. Further, the method can
be refined further to narrow the possible choices of binding sites and identify
the functionalities, i.e., organic fragments or "ORFs," that effectively interact
with the binding site(s). The data obtained [illegible]
orientations of the functionalities [illegible] candidate binding agent, thereby
providing a tool for searching chemical databases to identify candidate binding
agents. Where the methods described herein identify more than one potential
binding site, the data generated through these methods can be used to
energetically rank the binding sites, and thereby quantitatively determine
[illegible].

The computational method described here generates maps of binding
site preferences that are nearly identical with maps produced by compiling
data generated by traditional methods, but with one important difference: the
experimentally produced data took many years to produce while the data
produced as described herein can be produced in no more than a few weeks.
The invention provides an important development in unbiased simulation
methods for predicting the character of agents that bind to biological
macromolecules to affect the function of the macromolecules.



SUMMARY OF THE INVENTION 

In one embodiment provided is a method of identifying binding sites
on a macromolecule comprising: (a) for at least one organic fragment (ORF),
conducting, at separate values of parameter B, two or more simulated
annealing of chemical potential calculations using the ORF as the inserted
solvent; and (b) comparing converged solutions from step (a) to identify first
locations at which the relevant ORF is strongly bound, thereby identifying
candidate sites for binding ligand molecules. In one preferred aspect, the
method further comprises: (c) identifying clusters of sites that strongly bind an
ORF. In another preferred aspect, the method further comprises:
(d) conducting steps (a) and (b) for each of two or more ORFs and identifying
clusters where two or more distinct ORFs bind. Preferably, a cluster that
binds three or more distinct ORFs is identified. The method can identify
further [illegible] that contribute to the binding of bioactive agents by
reducing the binding stringency in the vicinity of a cluster to further identify
elements that would contribute to the binding of a bioactive agent.

In another preferred aspect, the method further comprises:
(e) conducting, at separate values of a measure of chemical potential, two or
more simulated annealing of chemical potential calculations using water as the
inserted solvent; (f) comparing converged solutions from step (e) to identify
locations at which water is strongly bound, thereby identifying locations on
the protein which are not candidate sites for binding ligand molecules; and
(g) identifying first locations that are not water locations.

In still another preferred aspect, the simulated annealing of chemical
potential calculations comprise multiple steps of sampling, wherein in a
number of steps of the sampling the ORF's position is changed by a small
amount and the resulting new position is accepted or rejected based on the
change in energy as a result of the change attempted.

Further provided is a method of identifying the chemical
characteristics of compounds that bind a macromolecule comprising
examining functionalities and relative orientations of the ORFs found in a
cluster pursuant to the binding site identifying method outlined above.

Also provided is a method of conducting combinatorial chemistry to
identify compounds that interact with a macromolecule comprising:
(a) identifying classes of reactants that are modeled by the functionalities of
the ORFs [illegible] the binding site identifying method [illegible]
macromolecule; (b) [illegible] synthetic protocol that calls for
two or more synthetic procedures that react reagents of at least two of the
classes identified in step (a); and (c) conducting the combinatorial synthetic
protocol to create candidate binding molecules.

Further provided is a method of conducting a bioactive agent discovery
process comprising: (a) from a group of established combinatorial synthetic
protocols or collections of chemical compounds or pools of chemical
compounds, identifying those members of the group that provide a high
density of compounds that meet, for a macromolecule, selection criteria
identified [illegible] identifying method; and (b) conducting
binding or functional assays to identify compounds obtained from the
identified collections or protocols which bind or affect the function of the
macromolecule.

BRIEF DESCRIPTION OF THE FIGURES

Figure 1A illustrates a solved crystal structure, while Figure 1B
displays the structure with a grid imposed.

Figures 2A-2D display the method of the invention applied to the
crystallographic solution of elastase; the method can be exemplified using
methanol as the ORF.

Figures 3A and 3B show the combined results for several ORFs bound
to elastase after simulations at relatively low B values, with the results in
Figure 3B filtered to identify clusters of these bound ORFs.



Figure 3C shows the two clusters of Figure 3B which remain after
excluding strong water binding sites, and Figure 3D shows the one cluster that
remains after extending the analysis to another ORF. Figure 3E shows the
analysis extended to still a further ORF.

The panels of Figure 3F compare the simulation results to a
co-crystallography result.

Illustrated in Figure 4A are the amide binding sites extracted from the
data of six co-crystallization experiments with elastase and known ligands;
and illustrated in Figure 4B is a cluster of the highest affinity amide binding
sites determined by the simulation.

Illustrated in Figure 4C are the amide ORFs of Figure 4B plus amides
which are in the vicinity of the cluster but which appear in the simulation at
second highest affinity binding values.

In Figures 5A and 5B, solutions obtained with co-crystals of elastase
inhibitors are compared with data obtained by the methods herein described.

Figures 6A and 6B show the surfaces of elastase involved in binding 
ligands as indicated by the crystallographic data, Figure 6A, and as indicated 
by the solutions obtained using the method described herein, Figure 6B. 

Figure 7 shows a schematic illustration of the type of titrations for
water binding to a macromolecule that can be used to help identify a level of 
relatively strong water binding. 

DETAILED DESCRIPTION OF THE INVENTION 

Definitions 

The following terms shall have, for the purposes of this application, the
respective meanings set forth below.

"Bioactive agent" refers to a substance such as a chemical that can act
on a cell, virus, organ or organism, including but not limited to drugs (i.e.,
pharmaceuticals) to create a change in the functioning of the cell, virus, organ
or organism. In a preferred embodiment of the invention, the method of
identifying bioactive agents of the invention is applied to organic molecules
having molecular weight of about 600 or less or to polymeric species such as
peptides, proteins, nucleic acids, proteoglycans and the like. A bioactive agent
can be a medicament, i.e., a substance used in therapy of an animal, preferably
a human.

"Cluster of free grid points" refers to free grid points that are within a
"cluster" in that, relative to a given ORF, there is a sufficient number of
nearby or adjacent free grid points to allow a reasonable probability that the
ORF could be inserted at the cluster. Thus, the cluster of free grid points for
H2O must be defined to identify all volumes at the surface or interior of a
macromolecule that could accommodate H2O, though the selection criteria
should err toward identifying some volumes that do not accommodate H2O, as
needed to assure that all appropriate volumes are sampled in the simulation
process. A cluster of free grid points is defined differently depending on the
size of the ORFs (e.g., compare H2O and benzene) and the spacing of the grid.

A "cluster of ORF binding sites" typically refers to a pattern of closely 
located or superimposed sites that bind ORFs with sufficient affinity to merit 
further consideration. 

"Collection of chemical compounds" refers to any collection of 
compounds collected or organized with the intention that they can be 
examined to identify bioactive agents (e.g., having a biological activity 
measured directly or through a surrogate for biological activity such as 
binding to a macromolecule or interfering with a function of a 
macromolecule). The collection can be prepared from a collection of simpler 
molecules (which can be bound to a support) by a chemical scheme designed 
to generate a diversity of chemicals. Collections of this latter type are often 
referred to as "combinatorial libraries." 

"Free grid points" refers to grid points (which are discussed below) 
which are, for a given accepted definition of atomic radius, "free" in that they 
do not fall within the atomic radii of the mapped atoms of the relevant 
macromolecule. 






"Macromolecule" refers to a molecule or collection of molecules which
has a time-averaged tertiary structure. Thus, while the term refers to
proteins, ribonucleic acids, structures formed of both nucleic acid and protein,
carbohydrates, structures formed of two or more of the aforementioned, and
the like, it can also refer to structures formed with other molecules including
lipids. Macromolecules are used in the method described herein with
reference to maps of their tertiary structure. Such maps are typically
generated by X-ray diffraction studies, which have generated maps for
thousands of macromolecules. However, maps can be produced by other
methods such as computational methods or computational methods
supplemented by other data such as NMR data. While computational methods
have been difficult to apply, recent studies appear to have achieved some
successes.

"Organic fragments" or "ORFs" are molecules or molecular fragments
that can be used to model one or more modes of interaction with a
macromolecule, such as the interactions of carbonyls, hydroxyls, amides,
hydrocarbons, and the like.

"Water locations" are locations at which water is strongly bound,
meaning, in one embodiment, for example, locations where the simulation
indicates water remains bound when the simulation is run at values of B that
are equal to or less than the B value for the transition point indicating those
water molecules that are strongly influenced by the macromolecule.
Illustrated in Figure 7 is a conceptualization of the titration of simulated bound
water molecules with decreasing values of B, a parameter described further
below. A transition point indicates water molecules that are strongly
influenced by the macromolecule. A B value less than or equal to that at the
transition point can be designated as defining water binding of sufficient
strength to render competitive binding by another molecule unlikely, as
illustrated by point SB in the illustration. Typically, for a water soluble
protein, this point SB is selected so that about 100 to about 50 water molecules
remain bound for a 50 kd protein.



The simulation process of the present invention works by artificially
inserting a given ORF at an unbiased sampling of all the sites on or within a
macromolecule structure where such ORF can, as a practical matter, reside.
These sites can be termed the "sampling sites." Typically, a schedule of
simulations for each of a number of ORFs is run, with each simulation run at
a separate value of a parameter B, which is related to the excess chemical
potential. The schedule provides for simulations conducted at each of a
number of B values, typically ranging from 10 to about [illegible]. In each
simulation at a given value of B, the simulation assesses at each step of the
simulation whether the insertion of the ORF at a given site shall be accepted or
rejected, with the assessment based on a grand canonical ensemble probability
density function. At each step of the simulation, the algorithm models the
insertion of the ORF at the site. A forced bias canonical probability density
function is used to translate and rotate the ORF in small steps (e.g., ±0.2 Å,
±30°) to identify an energy minimized insertion given the simulation
parameters in place at the time of the simulation step. The probability of the
insertion is then determined from the grand canonical ensemble probability
density function, and the ORF can be represented as resident at the site by a
random number generating protocol weighted to the probability value.
Alternative methods for choosing to make this representation, such as
applying cutoff values for when to represent the insertion or not, can also be
applied, but are less favored. Typically, following a successful insertion, the
subsequent deletion attempts at the site are with the previously identified
translated and rotated ORF, and this translated and rotated ORF is used until a
deletion attempt succeeds. The simulation is typically conducted for a large
number of steps, such as 2 x 10^6 steps, with the majority of the steps, e.g.,
1.5 x 10^6, required to "equilibrate" the simulation so that the number of
accepted insertions is equal to the number of deletions on average.
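
By way of illustration only, the accept-or-reject decision described above can be sketched in Python. The sketch assumes acceptance probabilities of the common textbook (Adams) form, min(1, exp(-ΔE/kT + B)/(N+1)) for an insertion and min(1, N exp(-ΔE/kT - B)) for a deletion; that form, the function names and the argument names are assumptions made for the example and are not taken from the specification.

```python
import math
import random


def insertion_probability(delta_e_over_kt, b, n):
    """Probability of accepting insertion of one ORF when n are resident.

    delta_e_over_kt is the energy change of the attempted insertion
    divided by kT; b is the B parameter (assumed Adams-style form).
    """
    log_p = b - delta_e_over_kt - math.log(n + 1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def deletion_probability(delta_e_over_kt, b, n):
    """Probability of accepting deletion of one of n resident ORFs."""
    if n == 0:
        return 0.0  # nothing is resident, so nothing can be deleted
    log_p = math.log(n) - delta_e_over_kt - b
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def accept(probability, rng=random):
    """Random number protocol weighted to the probability value."""
    return rng.random() < probability
```

Working in logarithms avoids overflow when B or the energy change is large. Lowering B lowers the insertion probability and raises the deletion probability, so only the more favorably bound fragments are retained.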

By taking a large number of unbiased samplings at each sample site
over the latter course of the simulation, such as after every 200 steps of
iterations after equilibrium is achieved, an occupation probability of the ORF
residing at that sample site at the given value of B can be assessed. The
occupancy as an overall result of the method can then be determined based on
this probability, for example with a random number protocol making the
representation based on its probability. The degree to which the ORF is
translated or rotated can also be represented based on the probability of such
translations and rotations.
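
As a hedged sketch of this sampling, the occupation probability of one sample site can be estimated from a per-step record of whether the site was occupied. The record format (a list of booleans, one per simulation step) and the function name are hypothetical choices for the example.

```python
def occupation_probability(occupied_by_step, equilibration_steps, stride=200):
    """Fraction of post-equilibration samples in which the site is occupied.

    occupied_by_step: sequence of booleans, one per simulation step
    equilibration_steps: number of leading steps to discard
    stride: sample every `stride` steps after equilibration
    """
    samples = occupied_by_step[equilibration_steps::stride]
    if not samples:
        raise ValueError("no samples remain after equilibration")
    return sum(1 for occupied in samples if occupied) / len(samples)
```

For a run of 2 x 10^6 steps with 1.5 x 10^6 equilibration steps and a stride of 200, this would average 2,500 samples per site.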

For each ORF, simulations are run at each of a number of values of a
measure of excess chemical potential, such as B. Thus, as this value lowers,
the retention of an ORF at a given sampling site is an indication of high
relative binding affinity.

The sampling sites are typically arrived at by creating a grid as
illustrated in Figure 1. Figure 1 illustrates a solved crystal structure
(Figure 1A) on which a grid is imposed (Figure 1B). For example, the grid
can have about [illegible] to about 1 Å spacing, with the grid intersection points
defining the candidates for sampling sites. The spacing of the grid is
preferably selected to be less than the smallest cross-section of the ORF. The
spacing is typically selected to be small enough in relation to the size of the
ORF that there is a high probability that free volumes that could define free
grid point clusters have sufficient free grid points to allow useful sampling as
described below. Such relatively small spacing minimizes the chance that the
selection of how to orient the grid will bias the algorithm against identifying
certain ORF binding preferences. The sampling sites are selected from sites that
are unoccupied by the macromolecule (Figure 1B). A final elimination of "grid
bias" is achieved by varying the test insertion points away from strict initial
insertion at grid points, as described below.

The sampling sites are limited to those sites having enough adjacent
volume free of the macromolecule to allow the ORF to be inserted. For
example, the sampling sites can be limited to grid points within an open area
of at least about 2 Å x 2 Å x 2 Å (= 8 Å^3 or 0.008 nm^3) or about 2.5 Å x 2.5 Å x
2.5 Å (= 15.6 Å^3 or 0.0156 nm^3) or, for water, about 2.2 Å x 2.2 Å x 2.8 Å. The
grid points can be selected for those free grid points that are within a cluster of
free grid points, such as a cluster of 3, 4, 5, 6, 7, 8 or more free
grid points, depending on the size of the ORF and the spacings of the grid.
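
The selection of free grid points and of free grid points lying within a cluster can be sketched as follows. This is a minimal illustration under stated assumptions: atoms are given as (x, y, z, radius) tuples, a grid point is "free" if it lies outside every atomic radius, and a point is kept if it has at least a chosen number of free face-adjacent neighbours. The neighbour rule and all names are assumptions for the example, not the criterion of the specification.

```python
import itertools
import math


def free_grid_points(atoms, bounds, spacing):
    """Return integer grid indices of points outside every atomic radius.

    atoms: list of (x, y, z, radius)
    bounds: ((xmin, xmax), (ymin, ymax), (zmin, zmax))
    """
    axes = []
    for low, high in bounds:
        count = int(math.floor((high - low) / spacing + 1e-9)) + 1
        axes.append([low + i * spacing for i in range(count)])
    free = set()
    for idx in itertools.product(*(range(len(axis)) for axis in axes)):
        point = tuple(axes[d][idx[d]] for d in range(3))
        if all(math.dist(point, atom[:3]) > atom[3] for atom in atoms):
            free.add(idx)
    return free


def clustered_points(free, min_neighbors):
    """Keep free points with at least min_neighbors free face neighbours."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
               (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    kept = set()
    for p in free:
        count = sum((p[0] + dx, p[1] + dy, p[2] + dz) in free
                    for dx, dy, dz in offsets)
        if count >= min_neighbors:
            kept.add(p)
    return kept
```

A larger ORF such as benzene would call for a stricter neighbour requirement, or a finer grid, than a small ORF such as water.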

In one preferred embodiment, the ORF is not necessarily initially
inserted exactly at the grid points, but instead at a random sampling of
insertion points within a short distance of the grid points, such as points within
a sphere shape centered at the grid point and having a diameter of about some
percentage, such as 10%, of the grid spacing, or within a box shape centered at
the grid point having a width, length and height of about such a percentage of
the grid distance. As discussed above, this "wobble" in the initial insertion
point helps eliminate grid bias where the placement of the grid happens to
reduce the chance [illegible] will be efficiently sampled.
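
A minimal sketch of the "wobble" is given below, assuming the spherical variant with a diameter equal to a fraction of the grid spacing. Rejection sampling inside the enclosing cube is one simple way to draw a uniform point in a sphere; it is an implementation choice for the example, and the function name is hypothetical.

```python
import math
import random


def wobble(grid_point, spacing, fraction=0.10, rng=random):
    """Return a random insertion point near grid_point.

    The point is uniform within a sphere centered at the grid point whose
    diameter is fraction * spacing.
    """
    radius = 0.5 * fraction * spacing
    while True:
        offset = [rng.uniform(-radius, radius) for _ in range(3)]
        if math.sqrt(sum(c * c for c in offset)) <= radius:
            return tuple(g + c for g, c in zip(grid_point, offset))
```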

Using the crystallographic solution of elastase, in particular, the pig
pancreatic elastase structural solution of G.A. Petsko of Brandeis University,
the method can be exemplified using methanol as the ORF. Figure 2A shows
the final solution using a relatively high B value, e.g., B = 10. Figure 2B
shows the final solution using an intermediate value, e.g., B = 6 or 7. Figure
2C shows the final solution using a lower intermediate value, e.g., B = 0 or -2
or -4. Figure 2D shows the final solution using a restrictive value, such as B =
-14. As illustrated, with lower values of B fewer and fewer methanol molecules
remain bound. These remaining methanol fragments indicate those that bind
with relatively high affinity.
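
The stepwise lowering of B can be sketched as a loop that records, for each sampling site, the lowest B value at which the ORF is still retained. The callable `run_simulation` is a hypothetical stand-in for a full simulation at one B value and is assumed to return the set of sites that remain occupied; it is not part of the specification.

```python
def anneal_chemical_potential(b_schedule, run_simulation, sites):
    """Run simulations from high B to low B and track site retention.

    b_schedule: iterable of B values (processed in descending order)
    run_simulation: callable (b, candidate_sites) -> set of occupied sites
    sites: initial sampling sites
    Returns {site: lowest B at which the site remained occupied}.
    """
    surviving = set(sites)
    lowest_b = {}
    for b in sorted(b_schedule, reverse=True):
        surviving = set(run_simulation(b, surviving)) & surviving
        for site in surviving:
            lowest_b[site] = b
    return lowest_b
```

Sites associated with the lowest recorded B values correspond to the relatively high affinity binding positions.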

The next step of the process is to conduct simulations with additional
ORFs and identify clusters of relatively high affinity ORF binding sites. Thus,
for example, again using elastase, simulations can be conducted to determine
binding for ORFs for ammonia, methanol, ketone and amide. Combined
results at relatively low B values are illustrated in Figure 3A. Clusters of ORF
binding sites are identified in Figure 3B. The method of the present invention
seeks to identify clusters of ORF binding sites, where the clusters can be made
up solely of one type of ORF. Preferably, however, the cluster will include
binding sites for 2, 3, 4, 5, 6, 7 or more distinct ORFs.
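
One way to identify such clusters is sketched below, assuming that bound ORFs are given as (name, position) pairs and that two sites belong to the same cluster when they lie within a distance cutoff (single-linkage grouping). The cutoff rule and all names are assumptions made for illustration.

```python
import math


def orf_clusters(sites, cutoff):
    """Group (orf_name, (x, y, z)) sites by single-linkage distance cutoff."""
    parent = list(range(len(sites)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(sites)):
        for j in range(i + 1, len(sites)):
            if math.dist(sites[i][1], sites[j][1]) <= cutoff:
                parent[find(i)] = find(j)
    groups = {}
    for i, site in enumerate(sites):
        groups.setdefault(find(i), []).append(site)
    return list(groups.values())


def multi_orf_clusters(sites, cutoff, min_distinct=2):
    """Keep only clusters containing at least min_distinct ORF types."""
    return [cluster for cluster in orf_clusters(sites, cutoff)
            if len({name for name, _ in cluster}) >= min_distinct]
```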






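Examples of useful ORFs include:

Name               Structure
Acetone            CH3(C=O)CH3
Aldehyde           H(C=O)-CH3
Amide              H(C=O)NH2
Ammonia            NH3
Benzene            (ring structure; drawing not reproduced)
Carboxylic Acid    CH3COOH
1,4-Diazine        (ring structure; drawing not reproduced)
Ester              CH3-O-(C=O)-CH3
Ether              CH3-O-CH3
Formaldehyde       H2C=O
Furan              (ring structure; drawing not reproduced)
Imidazole          (ring structure; drawing not reproduced)
Methane            CH4
Methanol           CH3OH
Phospho-Acid       HO-P(=O)(OH)-OH
Pyridine           (ring structure; drawing not reproduced)
Pyrimidine         (ring structure; drawing not reproduced)
Pyrrole            (ring structure; drawing not reproduced)
Thiol              CH3SH
Thiophene          (ring structure; drawing not reproduced)
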

Preferably, the ORFs selected are representative of chemical features that have
proven useful in the design of pharmaceuticals or other bioactive chemicals.

Thus, in a first mode of analysis, an important part of the process is to
run the simulations with several ORFs, identifying clusters of sites that bind
multiple ORFs with relatively high affinity. These clusters are strong
candidate sites for ligand binding sites. Moreover, the relative positioning of
the ORFs is instructive of the features of good binding agents. For example, at
the binding site identified on elastase by the methods described below, a
cluster having two benzene rings with an amide interposed between them
models some of the strongest elastase inhibitors derived from an extensive
research program, which inhibitors have a sulfonamide in place of the
carbon-based amide of the simulation. See Tables XXIII and XXV of Edwards et al.,
Med. Res. Rev. 14:127-194 (1994).

In some implementations of the invention, clusters of ORF binding
sites alone will identify, or substantially narrow the range of choices for, the
sites at which ligands interact with a given protein. However, in some
embodiments of the invention, the sites that bind water strongly are identified,
and the clusters that intersect with strong water binding sites are discounted.
Thus, in the elastase example, the candidate ligand binding sites of Figure 3B
are narrowed by excluding water binding sites, as illustrated in Figure 3C. If
the analysis is extended to five ORFs as illustrated in Figure 3D, a single
candidate site remains. Figure 3E shows a slightly different perspective of the
same site illustrated in Figure 3D, with the analysis extended to six ORFs.
Figure 3F shows how well the candidate site (left panel) matches up with the
structure of a co-crystal containing the ligand
trifluoroacetyl-lysyl-prolyl-p-isopropylanilide.

Accordingly, in a second mode of analysis, an optional step in the 
process is to narrow the choices for ligand binding sites by excluding ORF 
clusters that intersect with relatively strong water binding sites. 
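
This optional narrowing step can be sketched as a simple filter. The sketch assumes that a cluster "intersects" a water location when any ORF position in the cluster lies within a chosen overlap distance of a strongly bound water position; that distance criterion and the names used are assumptions for the example.

```python
import math


def exclude_water_clusters(clusters, water_sites, overlap_distance):
    """Drop ORF clusters that intersect strong water binding sites.

    clusters: list of clusters, each a list of (orf_name, (x, y, z))
    water_sites: list of (x, y, z) positions of strongly bound water
    """
    kept = []
    for cluster in clusters:
        intersects = any(math.dist(position, water) <= overlap_distance
                         for _, position in cluster
                         for water in water_sites)
        if not intersects:
            kept.append(cluster)
    return kept
```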

It should be noted that clusters of ORFs are typically identified at
relatively low B values, thereby helping to identify prospective binding sites
for ligands. However, further information about prospective binding sites can
be gleaned by looking, in the vicinity of a prospective binding site, at more
weakly binding ORFs. This information value flows from the prospect of
more weakly binding ORFs modeling a ligand interaction which, while weak
in isolation, models a real contribution to ligand binding affinity of a bioactive
agent as a whole. Illustrated in Figure 4A are the amide binding sites
extracted from the data of six co-crystallization experiments with elastase and
known ligands. Illustrated in Figure 4B is a cluster of the highest affinity
amide binding sites determined by simulation. Illustrated in Figure 4C are the
amide ORFs of Figure 4B plus amides which are in the vicinity of the cluster
but which appear in the simulation at the second highest affinity values. As
illustrated, this last step of expanding the results by looking at neighboring
lower affinity ORF binding sites helps to better model the results seen in
co-crystallography. Specifically, the cluster results identify the site at which the
majority of amide binding sites are seen in crystallography, but the expansion
extends the results to another cleft in elastase where amides have been
experimentally located. Additionally, the expansion identifies part of another
cleft at which ligand interactions are seen (as will be illustrated in other
Figures).

Thus, in a third mode of analysis, the features of ligand binding sites
indicated by other modes of analysis are expanded upon by looking to less
stringent simulation results in the vicinity [illegible]. The above
illustration focused on a cluster of one type of ORF, but is applicable with
clusters of many types of ORFs, where the expansions can be limited to one
type of ORF or multiple types of ORFs.

The data in Figures 4A-4C illustrate an important concept. Both in
actual ligand bindings and in the simulations, multiple effective binding
locations and orientations for a given type of moiety can be found to overlap.
This reflects the existence of [illegible] actions, rather than [illegible] by
crystallography, binding interactions will reflect a minimum.
In Figures 5A and 5B, solutions obtained with co-crystals of elastase
inhibitors are compared with data obtained by the methods herein described.
In Figure 5A, the solutions for six co-crystallized inhibitors are shown, with
the inhibitor molecules overlaid on each other (non-space-filling
representation, with the elastase segment represented by a space-filling
illustration). These inhibitors are trifluoroacetyl-l-lysyl-l-prolyl-
p-isopropylanilide (crystal solution: Mattos et al., as submitted April 30,
1994), trifluoroacetyl-l-lysyl-l-leucyl-p-isopropylanilide (crystal solution:
Mattos et al., as submitted June 22, 1994), trifluoroacetyl-l-phenylalanyl-
p-isopropylanilide (crystal solution: Mattos et al., as submitted April 30,
1994), trifluoroacetyl-l-phenylalan[illegible] (crystal
solution: Mattos et al., as submitted February 14, 1995), trifluoroacetyl-l-
valyl-l-alanyl-p-trifluoromethylanilide (crystal solution: Mattos et al., as
submitted February 14, 1995); and [illegible]
nitrobenzoyl) hydroxylamine (crystal solution: Ding et al., as submitted
July 10, 1995). In Figure 5B, the solutions for approximately 10 ORFs, which
are in their [illegible]. Both
methods identify a region which favors the binding of aromatic moieties. The
simulation process achieves approximately 90% 3D geometric identity with
the crystallography results.

Figures 6A and 6B show the regions of elastase involved in binding
ligands as indicated by the crystallographic data, Figure 6A, and as indicated
by the solutions obtained from the computational method described herein,
Figure 6B.

The simulations [illegible] of the Monte Carlo algorithm. The
form of Monte Carlo simulation [illegible]
Frenkel and Smit, Understanding Molecular Simulation: From Algorithms to
Applications [illegible]. The simulation method can
comprise:

    Locate a numeric representation of the macromolecule in a periodic cell.
    Optimize the position of the macromolecule in the cell.
    Locate [illegible], whether interior or surface cavities.
    Insert and delete the ORFs (including water) in these cavities.
    Compute the probabilities [illegible] canonical ensemble probability [illegible].
    Vary the chemical potential yielding relative free [illegible].

The methodology, [illegible] simulation, can be
introduced as follows:

Grand-Canonical Ensemble Simulations

The distinguishing feature of simulations in the grand-canonical
ensemble is the change in the number of molecules (ORFs) in the system
during the simulation. In other words, the sampling is not restricted to the
configuration space of a given dimension but it has to be extended to a set of
configuration spaces. [illegible] found, unexpectedly, that the complexity
of allowing for these [illegible] of molecules and the resulting
changing mass nonetheless makes the simulation computationally far
more efficient. The change in the number of molecules corresponds to the fact
that the grand-canonical partition function Ξ is the linear combination of the
corresponding canonical partition functions, Q, of a different number, N, of
molecules:

$$\Xi(T,V,\mu) = \sum_{N=0}^{\infty} e^{\mu N/kT}\, Q(T,V,N) \qquad (1)$$

where T is the absolute temperature, μ is the chemical potential, k is the Boltzmann constant, and Q(T,V,N) is defined by:

$$Q(T,V,N) = \frac{q^{N}}{N!} \int e^{-E(\mathbf{X}^{N})/kT}\, d\mathbf{X}^{N} \qquad (2)$$

with q being the molecular partition function [...]. The sampling of [...] been shown to be feasible using Metropolis Monte Carlo methods, where in each step of the sampling a molecule's (ORF's) position is changed by a small amount and the resulting new position is accepted or rejected based on the change in energy as a result of the change attempted. This position shifting can be thought of as effecting a "shaking" of the ORF to identify its favored positioning, and the "shaking" methodology, which can be biased in the direction of the forces, can be termed "force-bias Monte Carlo." When this shaking is applied, the simulation solutions reflect higher probability orientations. Accordingly:

$$P^{acc}_{move} = \min\left\{1,\ \exp(-\Delta E/kT)\right\} \qquad (3)$$

Notice that the temperature (kept constant during the simulation) enters the acceptance formula as a scaling factor of the energy change.
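By way of illustration only, the move-acceptance step described above can be sketched in Python. The sketch assumes the standard Metropolis form min(1, exp(-ΔE/kT)) and does not include any force bias; the function names and the choice of kcal/mol as the energy unit are hypothetical and are not taken from this specification.

```python
import math
import random

K_BOLTZMANN = 0.0019872  # kcal/(mol K); the energy unit is an assumption of this sketch


def move_acceptance_probability(delta_e, kT):
    """Standard Metropolis acceptance probability min(1, exp(-delta_e / kT))."""
    if delta_e <= 0.0:
        return 1.0  # downhill or neutral moves are always accepted
    return math.exp(-delta_e / kT)


def accept_move(delta_e, kT, rng=random):
    """Draw a uniform random number and decide whether the trial move is kept."""
    return rng.random() < move_acceptance_probability(delta_e, kT)


kT_298 = K_BOLTZMANN * 298.0  # temperature is held constant, e.g. at 298 K
print(move_acceptance_probability(1.0, kT_298))
```

Because the temperature only appears through the ratio ΔE/kT, doubling kT has the same effect on acceptance as halving every energy change.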

Generalizing the canonical ensemble Metropolis method to [...] in the grand-canonical ensemble calls for steps where the number of molecules (ORFs) changes. Operationally, this requires either the deletion of an existing molecule or the "creation", i.e., insertion, of a new one. It has been shown that when the deleted molecule is chosen randomly, then the deletion attempt should be accepted with the following probability:

$$P^{acc}_{del} = \min\left\{1,\ N \exp(-\Delta E/kT - B)\right\} \qquad (4)$$

where

$$B = \mu'/kT + \ln\langle N \rangle \qquad (5)$$

with μ' being the excess chemical potential, N the number of molecules (ORFs), <N> its Boltzmann average and V the volume of the system (which is a constant during the simulation). An attempt to insert a molecule (ORF) at a random location is accepted with the following probability:



$$P^{acc}_{ins} = \min\left\{1,\ \exp(-\Delta E/kT + B)\,\frac{1}{N+1}\right\} \qquad (6)$$



Here the effect of the chemical potential is introduced into the acceptance expression via the B parameter. The presence of the factors V and N follows from the relation between the canonical and grand-canonical partition functions: when a molecule (ORF) is taken out of the system, the integration over its coordinates (in Q_N) will yield a V factor and N is the last factor of N!. They can also be given a probabilistic interpretation: the insertion site will be chosen with probability 1/V and the molecule (ORF) to be deleted will be chosen with probability 1/N.
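A minimal sketch of the insertion and deletion acceptance tests follows, offered under stated assumptions rather than as the method of the specification. It assumes the form commonly paired with B = μ'/kT + ln<N>, in which the volume is absorbed into B, so that insertion carries a factor 1/(N+1) and deletion a factor N. The function names are hypothetical.

```python
import math


def insertion_acceptance(delta_e, kT, b, n):
    """Assumed form: min(1, exp(-delta_e/kT + b) / (n + 1)), n = molecules before insertion."""
    log_p = -delta_e / kT + b - math.log(n + 1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def deletion_acceptance(delta_e, kT, b, n):
    """Assumed form: min(1, n * exp(-delta_e/kT - b)), n >= 1 molecules before deletion."""
    log_p = -delta_e / kT - b + math.log(n)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


# Inserting into a 10-molecule system with an energy change of -5.0 at kT = 0.592
print(insertion_acceptance(-5.0, 0.592, -6.0, 10))
```

Working with the logarithm of the probability avoids overflow when the energy change is large and negative. With these forms, the ratio of the insertion probability (N to N+1) to the reverse deletion probability (N+1 to N) equals exp(-ΔE/kT + B)/(N+1), which is the condition needed for the two moves to balance.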

The simulation proceeds by alternating attempts to move, insert and delete molecules (ORFs) and accepting them with probabilities $P^{acc}_{move}$, $P^{acc}_{ins}$ and $P^{acc}_{del}$, as defined in Equations (3) through (6) above. After sufficiently long runs, the number of molecules (ORFs) N will fluctuate around its Boltzmann average <N>. If a given density has to be simulated then it is generally necessary to try different B values. In this regard, it is useful to note the following relationship:



$$\frac{\partial \langle N \rangle}{\partial B} = \langle N^{2} \rangle - \langle N \rangle^{2} \qquad (7)$$
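One possible use of such a relationship when searching for the B value that gives a desired density is sketched below. The sketch assumes the relationship is the standard fluctuation identity, in which the derivative of <N> with respect to B equals the variance of N, and applies a single Newton-style correction. The function name and the sample numbers are hypothetical.

```python
from statistics import fmean, pvariance


def next_b_estimate(b, n_samples, n_target):
    """One Newton-style update of B from the N values sampled during a run at B.

    Assumes d<N>/dB = <N^2> - <N>^2, so the slope is the sample variance of N.
    """
    mean_n = fmean(n_samples)
    var_n = pvariance(n_samples)
    if var_n == 0:
        raise ValueError("N did not fluctuate; run longer before updating B")
    return b + (n_target - mean_n) / var_n


# A run at B = -6.0 sampled N around 100; the target average is 101 molecules.
print(next_b_estimate(-6.0, [98, 100, 102, 100], 101))
```

The update is only a local linear estimate, so in practice it would be repeated with a fresh run at each new B value.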






This method has been found useful in simulating atomic fluids at moderate densities but runs into difficulties when room-temperature liquids are simulated. The difficulty stems from the fact that most insertion attempts will be at positions where there already is a molecule (e.g., from the solved protein structure), resulting in a large ΔE and the resulting probable rejection [...]


To increase the efficiency of insertion attempts, a cavity-biased insertion technique was introduced. Insertions are attempted only at sites where a cavity of suitable size already exists, thereby ensuring a non-negligible probability of acceptance. However, to ensure that the simulation thus modified still produces the required Boltzmann distribution, both the insertion and deletion acceptance probabilities have to be modified. The modified expression involves the probability of finding a cavity when there are N molecules (ORFs) present, $P^{N}_{cav}$, which follows from:



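$$P^{acc}_{ins} = \min\left\{1,\ \exp(-\Delta E/kT + B)\,\frac{P^{N}_{cav}}{N+1}\right\} \qquad (8)$$

$$P^{acc}_{del} = \min\left\{1,\ \exp(-\Delta E/kT - B)\,\frac{N}{P^{N-1}_{cav}}\right\} \qquad (9)$$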

In Equations 8 and 9, $P^{N}_{cav} V$ represents the volume of the regions of the system that contain cavities of suitable size. The efficiency of the cavity-biased method follows from the fact that the algorithm searching for cavities also yields $P^{N}_{cav}$ without extra steps. Calculations on a variety of fluids (water, benzene), which define ORFs, have confirmed that the cavity-biased method significantly increases the efficiency of insertion attempts and allows modeling of densities that proved to be impractical without this improvement.
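The cavity-biased acceptance tests can be illustrated with the following sketch. It rests on assumptions that are not confirmed by this specification: that the cavity probability multiplies the insertion probability and divides the deletion probability, that cavities are found by testing trial points against a fixed exclusion radius, and that periodic images are ignored for brevity. All names are hypothetical.

```python
import math


def cavity_fraction(test_points, centers, radius):
    """Fraction of test points with no molecule center within `radius` (no periodic images)."""
    free = sum(
        all(math.dist(p, c) > radius for c in centers) for p in test_points
    )
    return free / len(test_points)


def cavity_insertion_acceptance(delta_e, kT, b, n, p_cav_n):
    """Assumed form: min(1, exp(-dE/kT + b) * p_cav_n / (n + 1))."""
    if p_cav_n <= 0.0:
        return 0.0  # no cavity was found, so no insertion can be attempted
    log_p = -delta_e / kT + b + math.log(p_cav_n) - math.log(n + 1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


def cavity_deletion_acceptance(delta_e, kT, b, n, p_cav_n_minus_1):
    """Assumed form: min(1, exp(-dE/kT - b) * n / p_cav_n_minus_1), p_cav_n_minus_1 > 0."""
    log_p = -delta_e / kT - b + math.log(n) - math.log(p_cav_n_minus_1)
    return 1.0 if log_p >= 0.0 else math.exp(log_p)


print(cavity_fraction([(0, 0, 0), (5, 0, 0)], [(0, 0, 1)], 2.0))
```

The same scan that proposes insertion sites also counts how many trial points are free, which is why the cavity probability comes at no additional cost in this kind of scheme.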

Water binding



Aspects of the simulations used in the invention can be illustrated with calculations used to determine the strength of water binding to a synthetic polynucleotide (Guarnieri and Mezei, J. Am. Chem. Soc. 118: 8493-8494 (1996)). This illustrat[...]

This text illustrates how the method of simulated annealing of chemical potential allows bulk waters to be distinguished from bound waters, and how differentially bound waters may be distinguished from each other based on their relative chemical potentials. This is illustrated by showing that it takes more free energy to desolvate the minor groove than the major groove of a charged DNA dodecamer.

Grand canonical ensemble simulations are generally performed by placing a molecule in a periodic simulation cell, setting a parameter B, which is representative of free energy, in such a way as to achieve an experimentally determined density, sampling potential hydration positions around the molecule by inserting and deleting water molecules from the simulation cell using a technique such as cavity-bias (Mezei, Mol. Phys. 57:565-582 (1994); Resat and Mezei, J. Am. Chem. Soc. 116:7451-7452 (1994)), and accepting or rejecting the attempt based on a Metropolis Monte Carlo (Metropolis et al., J. Chem. Phys. 21:1087-1092 (1953)) criterion using a grand canonical ensemble probability function (Tolman, R., in The Principles of Statistical Mechanics, Dover Press, New York (1971)). The parameter B is related to the excess chemical potential μ' as follows: B = μ'/kT + ln<N>, where k is Boltzmann's constant, T is the absolute temperature, and <N> is the mean number of molecules of the ORF, which here is H2O. In the method of simulated annealing of chemical potential, the simulation is started with a large initial B-value so that a higher percentage of water insertion attempts are accepted. This causes the simulation cell to be flooded with water molecules. After this grand canonical ensemble simulation at high excess chemical potential is equilibrated, subsequent simulations are carried out at successively lower B-values. This successive lowering of the B-values causes a gradual removal of the bulk water molecules from the simulation cell. As the chemical potential is further "annealed", a point is reached at which water molecules do not readily leave the cell, thereby identifying those water molecules that are strongly influenced by the DNA, the so-called "bound water molecules". As the excess chemical potential is again lowered, ultimately some of these bound waters start to leave the cell. Since chemical potential is a free energy, this simulated annealing of chemical potential yields an estimate of the differential free energy of binding of the different bound water molecules. It must be emphasized that [...] applies strictly to the value of the chemical potential and that the temperature is kept constant at, for example, 298 K in all the simulations. For all simulations the DNA was held fixed, water molecules were added and deleted throughout all parts of the cell, extensive canonical Monte Carlo [...] was performed [...] canonical Monte Carlo steps, and periodic boundary conditions were used.
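The annealing schedule described above can be illustrated with a deliberately simplified sketch. It is a toy lattice model, not the continuous cavity-biased simulation of the specification: each site is independent, has a fixed binding energy in units of kT, and is filled or emptied with acceptance min(1, exp(B - e)) or min(1, exp(e - B)) respectively. These simplifications, the function name, and the example energies are all assumptions made for illustration.

```python
import math
import random


def anneal_chemical_potential(site_energies_kT, b_values, steps_per_b=4000, seed=0):
    """Lower B stepwise; return, per site, the lowest B at which it stayed mostly occupied."""
    rng = random.Random(seed)
    n_sites = len(site_energies_kT)
    occupied = [True] * n_sites           # start from a "flooded" cell
    release_b = [None] * n_sites
    for b in b_values:                    # b_values must run from high to low
        counts = [0] * n_sites
        for _ in range(steps_per_b):
            i = rng.randrange(n_sites)
            e = site_energies_kT[i]
            # log acceptance: deletion if the site is occupied, insertion if empty
            log_p = (e - b) if occupied[i] else (b - e)
            if log_p >= 0.0 or rng.random() < math.exp(log_p):
                occupied[i] = not occupied[i]
            for j in range(n_sites):
                counts[j] += occupied[j]
        for j in range(n_sites):
            if counts[j] / steps_per_b >= 0.5:
                release_b[j] = b          # still bound at this B value
    return release_b


schedule = [float(b) for b in range(0, -21, -1)]
weak, strong = anneal_chemical_potential([-2.5, -12.5], schedule)
print(weak, strong)
```

In this toy model the weakly bound site empties within a few B units of the start, while the strongly bound site persists to much lower B, which mirrors the way bulk and bound waters are separated by the annealing schedule.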

As an illustration of the method, a simulated annealing of chemical potential on d(CGCGAATTCGCG)2 was performed, starting with B = 1.0 down to -26 in 37 increments, performing 2,000,000 cavity-biased grand canonical ensemble Monte Carlo steps at each B-value. The final configuration of the simulation with B = -6 has 1120 bound water molecules. The final configuration of the simulation with B = -8 has 533 bound water molecules. The final configuration of the simulation with B = -9 has 390 bound water molecules. The final configuration of the simulation with B = -11 has 215 bound water molecules. The most salient feature of this progression is the differential hydration of the major and minor groove of the DNA. The B = -6 simulation shows the DNA essentially uniformly solvated. The B = -8 simulation clearly shows that upon lowering of the chemical potential by 2 B-units, a majority of the nonbulk extracted waters come from the major groove, while the minor groove remains almost unaffected. Annealing the chemical potential further (B = -9) still leaves the minor groove well hydrated while the major groove is almost stripped. Lowering B even further (B = -11) results in the removal of almost all water molecules from both the major and minor groove. Quantitation of the hydration of the DNA as a function of chemical potential was computed by proximity analysis (Mehrotra and Beveridge, J. Am. Chem. Soc. 102:4287 (1980); the effects of different partial charges on proximity analysis are described in: Mezei, Mol. Simul. 7:327-332 (1988)) with the results shown in Table 1:



         First hydration shell                    First and second hydration shell
         Minor groove        Major groove         Minor groove        Major groove
         No. of              No. of               No. of              No. of
  B      waters   density    waters   density     waters   density    waters   density

  -6     7.27     0.013      13.23    0.012       21.3     0.021      41.7     0.011
  -8     5.4      0.010      5.06     0.004       14.6     0.015      11.8     0.003
  -9     4.08     0.007      4.36     0.004       [...]    0.011      9.7      0.003
  -11    1.04     0.002      2.11     0.002       3.9      0.004      4.2      0.001



For B = -6, the first hydration shell (defined by the position of the first minimum of the radial distribution function) of the major and minor groove has a comparable density (0.012 and 0.013, respectively), while the second hydration shell of the minor groove has [...] the density of the major groove. For B = -8 the hydration difference become[...] the minor groove first and second shell hydration density being 2.5 fold and 5 fold higher than the major groove, respectively. For B = -11 the major and minor groove hydration density again becomes equal because at this value of the excess chemical potential both grooves are essentially [...] Illustrating the differential hydration prope[...] minor grooves of DNA is computationally undemanding (3 days of CPU time to run one annealing schedule and 3 days of CPU [...] analysis (Calculations of volume elements can be CPU intensive. The effects of volume element calculations on proximity analysis are described in Mezei and Beveridge in Methods in Enzymology, Packer, ed., Academic Press, New York (1986), pp. 21-47) on an SGI Power Challenge) using simulated annealing of chemical potential because only a coarse "cooling" schedule of the chemical potential is required. Since the chemical potential is a free energy, a very fine cooling schedule may be used to estimate quantitatively the hydration free energy difference of two different functional groups or even two different atoms of the DNA. Two atoms that desolvate at the same B-value have similar solvation free energy, or alternatively, require a finer cooling schedule to resolve the differences. It should be noted that the model system used here consisted of ionic DNA with 22 negative charges and no sodium counterions. The findings presented herein about the preferential hydration of the minor groove correspond very well to results from X-ray crystallographic and NMR studies. Possible reasons for the stronger binding of water molecules in the minor groove include the following: the high density of the charged rows [...] groups, steric constraints, and specific water-water, water-DNA interactions.
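The fold differences quoted above can be checked directly from the densities in Table 1. The following sketch simply divides the minor groove density by the major groove density for each B value; the dictionary layout and the function name are hypothetical, while the numbers are the table entries.

```python
# Densities from Table 1 as (minor groove, major groove) pairs.
TABLE_1_DENSITY = {
    -6: {"first": (0.013, 0.012), "first_and_second": (0.021, 0.011)},
    -8: {"first": (0.010, 0.004), "first_and_second": (0.015, 0.003)},
    -9: {"first": (0.007, 0.004), "first_and_second": (0.011, 0.003)},
    -11: {"first": (0.002, 0.002), "first_and_second": (0.004, 0.001)},
}


def minor_to_major_ratio(b, shell):
    """Ratio of minor groove to major groove hydration density, rounded to 2 places."""
    minor, major = TABLE_1_DENSITY[b][shell]
    return round(minor / major, 2)


for b in TABLE_1_DENSITY:
    print(b, minor_to_major_ratio(b, "first"), minor_to_major_ratio(b, "first_and_second"))
```

For B = -8 the two ratios come out as 2.5 and 5.0, matching the 2.5 fold and 5 fold figures stated in the text.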

The regions where water binds tightly on a protein are regions which are precluded from ORF binding. Thus, the remaining sites on the protein unoccupied by water are candidates for good ORF binding.
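One simple way to apply this exclusion to simulation output is sketched below. It assumes that both the ORF solutions and the tightly bound waters are available as Cartesian coordinates, and that an ORF site counts as water-occupied when a bound water lies within a chosen cutoff distance. The cutoff value of 2.0 and the function name are hypothetical choices, not values from this specification.

```python
import math


def sites_free_of_bound_water(orf_sites, bound_water_sites, cutoff=2.0):
    """Keep ORF sites that have no tightly bound water within `cutoff` (same length unit)."""
    return [
        site
        for site in orf_sites
        if all(math.dist(site, water) >= cutoff for water in bound_water_sites)
    ]


orf_sites = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
bound_waters = [(0.5, 0.0, 0.0)]
print(sites_free_of_bound_water(orf_sites, bound_waters))
```

Here the first ORF site is discarded because a bound water sits 0.5 units away, and the second is retained as a candidate.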

Antagonists and Agonists: Assays and Molecules

Candidate bioactive agents identified by the methods of the invention can be tested to assess their binding to the macromolecule in question. Where the macromolecules are responsible for [...] including disease states, it is therefore desirable to devise screening methods to identify compounds which stimulate or which inhibit the function of the macromolecule. Accordingly, in a further aspect, the present invention provides for a method of screening compounds to identify those which stimulate or which inhibit the function of such a macromolecule. In general, agonists or antagonists can be employed for therapeutic and prophylactic purposes for diseases. Compounds can be identified from a variety of sources, for example, cells, cell-free preparations, chemical libraries, and natural product mixtures.

The screening methods can simply measure the binding of a candidate 
compound to the macromolecule, or to cells or membranes bearing the 
macromolecule. The macromolecule can be a variant of the macromolecule 
used in the simulation method, such as a fragment retaining the binding site 

identified in the simulation or a fusion protein used to make recombinant



synthetic methods more practical. The screening method can involve 
competition with a labeled competitor. Further, these screening methods can 
test whether the candidate compound results in a signal generated by 
activation or inhibition of the macromolecule, using detection systems 
appropriate to the cells comprising the macromolecule. Inhibitors of 
activation are generally assayed in the presence of a known agonist and the 
effect on activation by the agonist by the presence of the candidate compound 
is observed. Further, the screening methods can simply comprise the steps of 
mixing a candidate compound with a solution containing a macromolecule, 
measuring macromolecule activity in the mixture, and comparing the activity 
of the mixture to a standard. 

The invention also provides a method of screening compounds to identify those which enhance (agonist) or block (antagonist) the action of macromolecules, including association of the macromolecule with itself or another macromolecule. The method of screening can involve high-throughput techniques. For example, to screen for agonists or antagonists, a synthetic reaction mix, a cellular compartment, such as a membrane, cell envelope or cell wall, or a preparation of any thereof, comprising macromolecule and a labeled substrate or ligand of such polypeptide is incubated in the absence or the presence of a candidate molecule that can be an agonist or antagonist. The ability of the candidate molecule to agonize or antagonize the macromolecule is reflected in decreased binding of the labeled ligand or decreased production of product from a substrate. Molecules that bind gratuitously, i.e., without inducing the effects of macromolecule, are most likely to be good antagonists. Molecules that bind well and, as the case can be, increase for example the rate of product production from substrate, increase signal transduction, or increase chemical channel activity are agonists. Detection of the rate or level of, as the case can be, production of product from substrate, signal transduction, or chemical channel activity can be enhanced by using a reporter system. Reporter systems that can be useful in this regard include but are not limited to colorimetric, labeled substrate converted into product, a reporter gene that is responsive to changes in macromolecule activity, and binding assays known in the art.

All publications and references, including but not limited to patents and patent applications, cited in this specification are herein incorporated by reference in their entirety as if each individual publication or reference were specifically and individually indicated to be incorporated by reference herein as being fully set forth. Any patent application to which this application claims priority is also incorporated by reference herein [...] above for publications and references.

While this invention has been described with an emphasis upon preferred embodiments, it will be obvious to those of ordinary skill in the art that variations in the preferred devices and methods may be used and that it is intended that the invention may be practiced otherwise than as specifically described herein. Accordingly, this invention includes [...] encompassed within the spirit and scope of the invention as defined by the claims that follow.






WHAT IS CLAIMED IS:

1. A method of identifying binding sites on a macromolecule comprising:

(a) for at least one organic fragment (ORF), conducting, at separate values of parameter B, two or more simulated annealing of chemical potential calculations using the [...]

(b) comparing converged solutions from step (a) to identify first locations at which the relevant ORF is strongly bound, thereby identifying candidate sites for binding ligand molecules.

2. The method of claim 1, further comprising:

(c) identifying clusters of sites that strongly bind an ORF.

3. The method of claim 2, further comprising:

(d) conducting steps (a) and (b) for each of two or more ORFs and identifying clusters where two or more distinct ORFs bind.

4. The method of claim 3, wherein a cluster that binds three or more distinct ORFs is identified.

5. The method of claim 3, further comprising reducing the b[...] in the vicinity of a cluster to further identify elements that would contribute to the binding of a bioactive agent.

6. The method of claim 1, further comprising:

(e) conducting, at separate values of a measure of chemical potential, two or more simulated annealing of chemical potential calculations using water as the inserted solvent;

(f) comparing converged solutions from step (c) to identify locations at which water is strongly bound, thereby identifying water locations which are not candidate sites for binding ligand molecules; and

(g) identifying first locations th[...]

7. The method of claim 1, wherein the simulated annealing of chemical potential calculations comprise multiple steps of sampling, and wherein in a number of steps of the sampling the ORF's position is changed by a small amount and the resulting new position is accepted or rejected based on the change in energy as a result of the change attempted.

8. A method of identifying the chemical characteristics of compounds that bind a macromolecule comprising examining the functionalities and relative orientations of the ORFs found in a cluster pursuant to the binding site identifying method of claim 3.

9. A method of conducting [...] that interact with a macromolecule com[...]

(a) identifying classes of reactants that are modeled by the functionalities of the ORFs found in a cluster pursuant to the binding site identifying method of claim 3;

(b) designing a combinatorial synthetic protocol that calls 
for two or more synthetic procedures that react reagents of at least two of the 
classes identified in step (a); and 

(c) conducting the combinatorial synthetic protocol to 
create candidate binding molecules. 

10. A computer implemented method of analyzing a macromolecule for potential binding sites, comprising:

(1) positioning an instance of a computer presentation of an organic fragment at a plurality of potential binding sites of the macromolecule;

(2) selecting a value of B, wherein B = μ'/kT + ln<N>, where μ' is the excess chemical potential, k is Boltzmann's constant, T is the absolute temperature, and <N> is the mean number of molecules of the organic fragment;

(3) repositioning the instances of the organic fragment until a minimized energy state is obtained;

(4) assessing, for each instance of the repositioned organic fragment, whether the repositioned organic fragment binds to the macromolecule at the associated potential binding site at the selected value of B;

(5) deleting instances of the organic fragment that do not bind at the associated potential binding site at the selected value of B;

(6) repeating steps (1) through [...] value of B; and

(7) outputting a list of undeleted instances of the organic fragment;

provided that the organic fragment is not water.

11. The method according to claim 10, wherein the potential binding sites comprise an unbiased sampling of sites of the macromolecule.

12. The method according to claim 10, wherein said positioning comprises imposing a grid on the computer readable representation of the macromolecule, wherein the potential binding sites comprise the grid intersection points.

13. The method according to claim 12, wherein the spacing of the grid is less than the smallest cross-section of the organic fragment.






14. The method according to claim [...] further comprise points within a sphere shape centered at a grid point and having a diameter of [...]

15. The method according to claim 12, wherein the potential binding sites further comprise points within a rectangular solid shape centered at a grid point and having dimensions of about 10% of the grid spacing.

16. The method according to claim 10, further comprising:

(8) repeating steps (1) through (7) for one or more additional organic fragments.

17. The method according to claim 10, further comprising:

(8) repeating steps (1) through (6) wherein the organic fragment is a water molecule, wherein step (7) further comprises outputting a list of undeleted instanc[...] that are not associated with undeleted instances of the water molecule.

18. The method according to claim 16, further comprising:

(9) repeating steps (1) through (6) wherein the organic fragment is a water molecule, wherein step (7) further comprises outputting a list of undeleted instances of the organic fragments that are not associated with undeleted instances of the water molecule.

19. The method according to claim 10, further comprising:

(8) determining potential binding sites from the list of undeleted instances of the organic fragment.

20. The method according to claim 19, further comprising:

(9) selecting compounds exhibiting a functionality and relative orientation of the undeleted instances of the organic fragment; and

(10) conducting binding or functional assays to identify the [...]

21. The method according to claim 16, further comprisin[...]

(9) determining potential bin[...] of [...]

22. The method according to claim 21, further comprising:

(10) selecting compounds exhibiting functionalities and relative orientations of instances of the organic fragments; and

(11) conduct[...] or functional assays to identify the [...]

23. The method according to claim 10, wherein step (3) comprises performing a grand-canonical Monte Carlo simulation.



[Drawing sheets 1/16 through 16/16. Figure labels that survive in this copy: FIG. 2A, FIG. 2B, FIG. 2C, FIG. 3C, FIG. 3D, FIG. 7.]



INTERNATIONAL SEARCH REPORT

International application No. PCT/US03/07366

A. CLASSIFICATION OF SUBJECT MATTER
IPC(7): G06F 19/00
US CL: 702/22
According to International Patent Classification (IPC) or to both national classification and IPC

B. FIELDS SEARCHED

Minimum documentation searched (classification system followed by classification symbols)
U.S.: 702/22

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practicable, search terms used)
STN online

C. DOCUMENTS CONSIDERED TO BE RELEVANT

Category*: X, Y

Citation of document, with indication, where appropriate, of the relevant passages:

GUARNIERI et al. Simulated annealing of chemical potential: A general procedure for localizing bound waters. J. Amer. Chem. Soc., 1996, Vol. 118, 8493-8494.

MEZEI et al. Structural chemistry of biomolecular hydration via computer simulation: The proximity criterion. Methods Enzymology, 1986, Vol. 127, pages 21, 22.

Relevant to claim No.: 1,6,7; 1,2,[...]; 1,2,7

[ ] Further documents are listed in the continuation of Box C.    [ ] See patent family annex.

* Special categories of cited documents:

"A" document defining the general state of the art which is not considered to be of particular relevance

"E" earlier application or patent published on or after the international filing date

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or other means

"P" document published prior to the international filing date but later than the priority date claimed

"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art

"&" document member of the same patent family

Date of the actual completion of the international search: 08 October 2003 (08.10.2003)

Date of [...] search report

Name and mailing address of the ISA/US
Mail Stop PCT, Attn: ISA/US
Commissioner for Patents
P.O. Box 1450
Alexandria, Virginia 22313-1450
Facsimile No. (703) 305-3230

Form PCT/ISA/210 (second sheet) (July 1998)



Box I Observations where certain claims were found unsearchable (Continuation of Item 1 of first sheet)

This international report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons:

1. [ ] Claim Nos.:
because they relate to subject matter not required to be searched by this Authority, namely:

2. [ ] Claim Nos.:
because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically:

3. [ ] Claim Nos.:
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a).

Box II Observations where unity of invention is lacking (Continuation of Item 2 of first sheet)

This International Searching Authority found multiple inventions in this international application, as follows:

1. [ ] As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims.

2. [ ] As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee.

3. [ ] As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims for which fees were paid [...]

4. [X] No required additional search fees were timely paid by the applicant. Consequently, this international search report is restricted to the invention first mentioned in the claims; it is covered by claims Nos.: 1-7

Protest
[ ] The additional search fees were accompanied by the applicant's protest.
[ ] No protest accompanied the payment of additional search fees.

Form PCT/ISA/210 (continuation of first sheet(1)) (July 1998)






